This report investigates the links between motor vehicle accidents (MVA incidents) and potential contributing factors. The ultimate goal is to use these findings to address current safety concerns. The data allow us to explore various features, including depot, gender, age, tenure, weather, time and drivers’ workload. Data visualisation provides an initial view of the trends and patterns linking these features to the occurrence of incidents. Statistical tests then assess whether each selected feature has a significant effect on incidents. Rather than predicting individual incidents, the modelling aims to reveal the underlying effect of each factor on incidents. Combining the findings with their interpretation, we make several recommendations for improving the safety of drivers and passengers.
Overall, the results are consistent: all of the chosen variables are associated with the occurrence of incidents. Depots have not been formally tested with statistical methods, but differences in incidents between regions are still visible, with crowded areas showing higher driver incident rates. The difference between genders is significant, with male drivers more likely to be involved in an incident, although this result needs further discussion. Few young drivers have been recorded with an incident. However, this is arguable given their short tenure and the small number of trips they have driven compared with other age groups. The eldest age group consists of stable drivers with fewer incidents. Temperature and precipitation are both significantly associated with incidents, yet their impact is not as pronounced as expected. Time is also important, as more incidents occur during peak traffic hours. Finally, drivers’ workload is related to incidents during both the daytime and the night, so drivers should not be assigned long shifts.
Public transport has been regarded as a response to environmental pollution and road safety issues. Ideally, the more people use public transport, the lower the traffic volume and pollution, and the better the road safety (Mackie, 2008). Its benefits for both problems have been widely documented. For instance, a traffic policy reform in Ljubljana, Slovenia, that allowed only buses and taxis into some areas cut Black Carbon pollution by 72% (Titos, et al., 2015). Regarding safety, injuries due to accidents have fallen by 24% since night buses were introduced in Israel (Sadot, 2019). Viewed from the outside, public transport appears to be a win-win solution, but it is not entirely so. Public transport itself is exposed to risks at any place and time, and these risks cannot be neglected. Its high passenger capacity means that the consequences of any accident are heavier.
Statistics on bus accidents vary between studies. Samerei et al. (2021) report a downward trend in bus accident fatalities from 2009 to 2014, but the trend did not last, with fatalities growing afterwards. In contrast, statistics from Hughes (2021) show a significant drop in fatalities from 2017 to 2018, followed by a rise in 2019. Such inconsistencies may result from the different data and analysis methods used, which suggests the need to conduct analyses with the data in hand. The decline in bus usage over these two years of pandemic uncertainty has made understanding the causes of bus accidents even more important. Beyond passengers’ potential shift towards private vehicles to avoid crowded buses, the public transport industry could either grow or decline as the pandemic abates (IBISWorld, 2020). It is therefore crucial to examine bus operations now to minimise safety concerns in the near future.
Numerous factors induce bus accidents. Researchers who have performed relevant analyses mainly focus on three dimensions: drivers, vehicles and environments. Drivers’ gender, age and behaviour count as driver characteristics and are frequently discussed in relation to driving performance. Vehicle-specific causes include factors such as the age and type of vehicle, which could affect drivers’ control over the bus. Environmental factors are external ones, e.g. speed limits, road design and lighting facilities, that could contribute to bus accidents. The extent to which these variables are analysed here depends on the information available in the data; however, any additional related variables found will also be inspected.
This report addresses five proposed questions to uncover, through data manipulation, the risks carried by the identified factors. First, we assess the effects of the selected variables (depot, gender, age and drivers’ tenure) on the occurrence of MVA incidents (see appendix). Data visualisation provides an overview of the existing patterns, and multiple statistical tests assess whether differences between the groups of a variable lead to different incident outcomes. Next, visualisation helps us understand the patterns of MVA incidents over time, e.g. by day of the week, hour of the day and day/night. In this report, weather conditions are described by temperature and precipitation. Because some weather conditions may be over- or under-represented in the data, statistical tests help to determine their significance for MVA incidents. In the last question, a random forest is built to judge the importance of different influencing factors for incidents, revealing linkages between variables and their impacts on the outcome. Finally, recommendations are made based on the analysis results.
Although promoting public transport is a common vision among nations, it still faces road safety issues, with bus accidents growing in some countries. In emerging countries, e.g. Nepal and India, drivers and driving behaviour have been the leading cause of accidents, followed by external factors and vehicle conditions (Pearce & Maunder, 2000). Even in a developed country such as Australia, it is necessary to investigate the common factors mentioned above and to discover further relevant ones.
Truong and Currie (2019) assess how effective public transport use is in addressing road safety issues. Each type of transport is measured separately in terms of its relative impact. Their CAR model takes into account factors such as the age of the population in the sampled area and speed limits to study the relationship between the environment and crashes. The results show buses to be the safest mode of transport. A percentage point increase in the proportion of bus riding could reduce 5.7. Further, factors relevant to buses (signalised intersections, public transport stops and roads with speed limits above 100 km/h) are positively associated with crashes.
Samerei, et al. (2021) follow up by examining how related factors relate to fatalities from bus accidents in Australia. Clustering analysis and association rules are applied so that simultaneous or conditional events can be revealed. In the clustering analysis, collisions with motor vehicles on weekdays account for the largest proportion of total crashes (55%). Weekends are particularly dangerous, accumulating the most fatalities and serious injuries compared with other clusters. According to the association rules, on weekdays, roads with speed limits over 50 km/h are more likely to see bus collisions with motor vehicles than roads with lower limits. Old, male drivers are associated with a 1.98 increase in cases, except for collisions with motor vehicles on weekdays. On weekends, the presence of pedestrians on highways is highly risky, with a 15.35-fold increase in fatalities. Additionally, darkness is more likely to lead to fatal bus accidents in Australia.
In contrast, taking a preventive perspective, Porcu, et al. (2019) establish a risk index methodology to examine bus safety in Cagliari, Italy. The probability and severity of bus accidents, together with exposure factors, are incorporated into the risk assessment index. Unlike in Truong and Currie (2019), the relevant variables are not explicitly specified in the function but are grouped under six broad factors: infrastructure, driver, vehicle, etc. Based on model selection, the study concludes that in Cagliari, medium- and short-sized buses travelling on roads with two to three lanes and bus priority are associated with a low risk index. Unlike the results in Australia (Samerei et al., 2021), night is less dangerous in Cagliari. Conversely, drivers of standard-sized buses on roads with wide sidewalks and on neighbourhood roads should be more careful, as these factors tend to raise the risk level. Winter is also found to be more dangerous.
Unlike in countries such as the US and the UK, bus operations in Malaysia have not been well managed over the years, with intercity bus accidents increasing. Law, et al. (2017) adopt a similar idea to Porcu et al. (2019) by creating a Safety Performance Index (SPI) to evaluate risk based on the attributable factors. After aggregating and classifying the factors, they find that road environment conditions are the most detrimental cause of bus accidents. Two-lane roads, narrow road shoulders and nighttime trips are associated with the highest risk. Their data exploration also identifies factors such as mobile phone use while driving, speeding and low-quality tyres as hazardous.
Goh, et al. (2014) focus on bus priority policy to determine how much it contributes to resolving bus accident issues in Melbourne. Prediction models are built with the potential factors as inputs, and the corresponding parameters are estimated. The better-performing model identifies bus priority as a useful approach to maintaining road safety, with a 53.3% reduction in accident frequency. Further, it reveals that accidents tend to occur more often as traffic volume, route length and service frequency increase. It also confirms the positive correlation between bus stop density and accident frequency found by Truong and Currie (2019).
In brief, bus stop density and route length, especially on highways, are the common factors across Australian-based studies that deserve more attention in addressing bus safety. Across the regions discussed, bus priority plays a significant role in reducing environmental interference for bus drivers. The effect of darkness varies between countries, but it contributes to bus accidents in Australia.
All data used in this report were obtained from Transdev and are listed as follows:
Driver_Info.csv contains personal information and employment records for drivers across Melbourne, New South Wales (NSW), Queensland and Western Australia (WA). It is the main source for analysing drivers’ characteristics, as it includes variables such as Depot, Gender, Age and Tenure. In the significance tests, each of these is treated individually as an explanatory variable.
Incidents.csv records every incident across NSW, Melbourne and WA and is the most widely used data set in the analysis. Observations are restricted to those whose ClaimCategoryName includes “MVA” or “motor”, so that the outputs align with our focus on MVA incidents. For the three interactive plots, incidents are classified according to their severity (see appendix). Incidents.csv is often joined with other data sets through relevant variables, e.g. depot, shift times and weather, to assess to what extent the variables are correlated. Incidents are often converted into binary form, e.g. whether an incident occurred in a given hour, and then treated as the response variable.
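The MVA filter described above can be sketched in Python as follows. This is a minimal illustration, not the report’s own code: it assumes rows are dictionaries read from Incidents.csv (e.g. with `csv.DictReader`), that a row qualifies if its ClaimCategoryName contains either keyword, and that the match ignores case. The helper names and the example category strings are hypothetical.

```python
import re

# Assumption: a claim counts as an MVA incident if its category mentions
# "MVA" or "motor" anywhere, ignoring case.
MVA_PATTERN = re.compile(r"MVA|motor", flags=re.IGNORECASE)


def is_mva(row):
    """Return True if the row's ClaimCategoryName looks like an MVA claim."""
    category = row.get("ClaimCategoryName") or ""  # treat missing as empty
    return MVA_PATTERN.search(category) is not None


def filter_mva(rows):
    """Keep only the MVA-related incident rows."""
    return [row for row in rows if is_mva(row)]


# Hypothetical example rows; real category names may differ.
incidents = [
    {"ClaimCategoryName": "MVA - Third Party"},
    {"ClaimCategoryName": "Motor vehicle damage"},
    {"ClaimCategoryName": "Passenger slip"},
    {"ClaimCategoryName": None},
]
mva_incidents = filter_mva(incidents)
print(len(mva_incidents))  # 2
```

If the original filter was case-sensitive, dropping `re.IGNORECASE` would exclude categories such as “Motor vehicle damage”, so the choice is worth checking against the real category list.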
melbourneweather_20150101_20210428.csv and sydneyweather_20150101_20210415 contain the weather data for Melbourne and Sydney. Hourly precipitation (mm) and temperature (°C) are the key weather measures. The weather variables are positioned as response variables in the statistical tests in section 4, but this does not mean that the occurrence of incidents determines the rainfall. Rather, the tests summarise the association between rainfall and incidents.
NSWshifts.csv contains the bus shift information for Sydney. A 30-minute buffer is added before the start and after the end of each shift, as some incidents occurred just before or after a shift. A shift is recorded for every trip as soon as it starts and ends, so many shifts have very short timeframes. We therefore aggregate the shift start and end times daily for each driver to make the analysis clearer. These start and end times are also key inputs to the random forest. Drivers’ driving hours are derived by joining with Incidents.csv to examine how drivers’ workload relates to the probability of incidents.
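The daily aggregation with a 30-minute buffer can be sketched as below. This is a hedged illustration under stated assumptions: shifts are `(driver_id, start, end)` tuples of `datetime` values, each daily window is keyed by the shift’s start date, and the function names are hypothetical rather than taken from the report’s code.

```python
from datetime import date, datetime, timedelta

BUFFER = timedelta(minutes=30)  # 30-minute buffer, as described above


def daily_windows(shifts):
    """Collapse short shift records into one window per driver per day.

    shifts: iterable of (driver_id, start, end) tuples with datetime values.
    Returns {(driver_id, day): (window_start, window_end)}, where the window
    runs from 30 minutes before the earliest start to 30 minutes after the
    latest end on that day (keyed by the shift's start date).
    """
    windows = {}
    for driver_id, start, end in shifts:
        key = (driver_id, start.date())
        if key in windows:
            lo, hi = windows[key]
            windows[key] = (min(lo, start), max(hi, end))
        else:
            windows[key] = (start, end)
    return {key: (lo - BUFFER, hi + BUFFER) for key, (lo, hi) in windows.items()}


def incident_in_window(windows, driver_id, when):
    """True if an incident at `when` falls inside one of the driver's windows.

    Neighbouring days are also checked because a buffered window can cross
    midnight.
    """
    for offset in (-1, 0, 1):
        window = windows.get((driver_id, when.date() + timedelta(days=offset)))
        if window is not None and window[0] <= when <= window[1]:
            return True
    return False


shifts = [
    ("D1", datetime(2021, 3, 1, 6, 0), datetime(2021, 3, 1, 8, 0)),
    ("D1", datetime(2021, 3, 1, 14, 0), datetime(2021, 3, 1, 18, 30)),
]
windows = daily_windows(shifts)
print(windows[("D1", date(2021, 3, 1))])
print(incident_in_window(windows, "D1", datetime(2021, 3, 1, 18, 50)))  # True
```

One consequence of aggregating by day is that the gap between split shifts (8:00 to 14:00 in the example) falls inside the daily window. This trade-off is what makes the many short trip-level records manageable.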
All drivers are first grouped by their depot. The driver incident rate for each depot is the number of drivers with MVA incidents divided by the total number of drivers in that depot.
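The driver incident rate can be computed as in the following sketch. It assumes each driver appears once in the driver list and that incident records carry the driver’s id. A driver with several incidents is counted only once, matching the definition above. The function name and the toy data (including which drivers had incidents) are hypothetical.

```python
from collections import defaultdict


def driver_incident_rate(drivers, incident_driver_ids):
    """Share of drivers in each depot with at least one MVA incident.

    drivers: iterable of (driver_id, depot) pairs, one per driver.
    incident_driver_ids: driver ids from the filtered MVA incidents
    (duplicates allowed; each driver is counted once).
    """
    with_incident = set(incident_driver_ids)
    totals = defaultdict(int)
    hits = defaultdict(int)
    for driver_id, depot in drivers:
        totals[depot] += 1
        if driver_id in with_incident:
            hits[depot] += 1
    return {depot: hits[depot] / totals[depot] for depot in totals}


# Toy data for illustration only.
drivers = [("a", "Revesby"), ("b", "Revesby"),
           ("c", "Joondalup"), ("d", "Joondalup"),
           ("e", "Joondalup"), ("f", "Joondalup")]
rates = driver_incident_rate(drivers, ["a", "c", "c", "d", "e"])
print(rates)  # {'Revesby': 0.5, 'Joondalup': 0.75}
```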
Out of 23 depots, only 4 have no recorded incidents. Drivers at depots in NSW and Melbourne account for more than half of the MVA accidents. In NSW, Revesby and Taren Point are the depots with the highest driver incident rates. Although NSW has the most depots with incidents, drivers at Melbourne depots are involved in the most incidents. Doncaster, North-Fitzroy, Sunshine West and Thomastown stand out as the most dangerous depots.
Driver incident rates also differ between states. Rates for depots in WA are much lower than those in NSW and Melbourne, which may reflect differences in population. Even though only a very small proportion of WA drivers had incidents, 75% of drivers at the Joondalup depot have encountered MVA incidents.
To evaluate how incident severity relates to driver incident rates across depots, we divide incidents into “serious” and “slight” according to incident type. The results for serious incidents closely resemble those for all MVAs, on a smaller scale. For slight incidents, only a few depots in NSW and one in WA appear. Again, the driver incident rates for the Revesby and Taren Point depots exceed 50% for slight incidents. The comparison with the previous graph implies that most of the MVA incidents encountered by drivers at depots in Melbourne and WA are serious.
Queensland is absent because the data set lacks the corresponding data. Gender differences exist in the data. In NSW and Melbourne, there are around 10 to 15 male drivers per female driver, compared with approximately 6 in WA. Although male drivers’ incident rates exceed female drivers’ in all three states, the differences are only slight, and the ratio of MVA incidents between genders is nearly 1:1. In short, the proportion of female drivers having incidents is unexpectedly high.
When classified by incident severity, the figure shows results similar to those for all MVAs. Both male and female drivers in NSW, as well as male drivers in WA, contribute to slight incidents.
Male drivers’ incident rate is about 19 percentage points higher than that of female drivers (59.47% vs 40.16% in the table below), and the graphs above also suggest that male drivers have been exposed to more incidents. Because the numbers of male and female drivers are highly imbalanced, it could be unfair to draw conclusions from the frequency distribution alone. A two-sample test for equality of proportions is therefore used to determine whether the difference could be due to random chance.
| Gender | no incident | incident | total | prop |
|---|---|---|---|---|
| Female | 225 | 151 | 376 | 40.16 |
| Male | 1566 | 2298 | 3864 | 59.47 |
\(H_0: \text{Gender and incidents are independent, i.e. male and female drivers have the same incident proportion.}\) \(H_1: \text{Gender and incidents are dependent, i.e. the incident proportions of male and female drivers differ.}\)
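The p-value is effectively 0 (it is displayed as 0 after rounding), so we reject the null hypothesis and conclude that gender is associated with incidents. The estimated incident proportion is higher for male drivers (0.595 vs 0.402), and the 95% confidence interval for the female minus male difference (-0.246 to -0.140) excludes 0. Male drivers are therefore indeed more likely to have incidents.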
| estimate1 | estimate2 | statistic | p.value | parameter | conf.low | conf.high | method | alternative |
|---|---|---|---|---|---|---|---|---|
| 0.4015957 | 0.5947205 | 51.59346 | 0 | 1 | -0.2464959 | -0.1397536 | 2-sample test for equality of proportions with continuity correction | two.sided |
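The test above can be reproduced from the counts in the gender table. The following Python sketch mirrors the two-group formulas that R’s `prop.test` uses with continuity correction, which is the method named in the output. It is an illustration written for this report, and the function name `two_prop_test` is hypothetical. With the table’s counts it recovers the reported statistic (about 51.59) and confidence interval, and it returns the small but non-zero p-value that the tidied output rounds to 0.

```python
import math
from statistics import NormalDist


def two_prop_test(x, n, conf_level=0.95):
    """Two-sample test for equality of proportions with Yates' continuity
    correction, following the two-group formulas of R's prop.test."""
    x1, x2 = x
    n1, n2 = n
    p1, p2 = x1 / n1, x2 / n2
    delta = p1 - p2
    inv_n = 1 / n1 + 1 / n2
    yates = min(0.5, abs(delta) / inv_n)  # continuity correction term
    # Chi-squared statistic on the 2x2 table (incident / no incident).
    pooled = (x1 + x2) / (n1 + n2)
    observed = [x1, n1 - x1, x2, n2 - x2]
    expected = [n1 * pooled, n1 * (1 - pooled), n2 * pooled, n2 * (1 - pooled)]
    stat = sum((abs(o - e) - yates) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of chi-squared with 1 df equals the two-sided normal tail.
    p_value = math.erfc(math.sqrt(stat / 2))
    # Continuity-corrected Wald interval for p1 - p2.
    z = NormalDist().inv_cdf(0.5 + conf_level / 2)
    width = z * math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2) + yates * inv_n
    conf_int = (max(delta - width, -1.0), min(delta + width, 1.0))
    return {"estimates": (p1, p2), "statistic": stat,
            "p_value": p_value, "conf_int": conf_int}


# Female: 151 of 376 drivers with incidents; male: 2298 of 3864.
result = two_prop_test(x=(151, 2298), n=(376, 3864))
print(result)
```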
All drivers are grouped by age group, and the plot shows, for each age group, the proportion of drivers who have recorded MVA incidents. Neither the youngest nor the eldest drivers are the most dangerous. The 31-40 age group has the highest driver incident rate, and the rate then declines through the older age groups up to 51-60.
The results for serious incidents are roughly the same as those for all MVAs, on a lower scale. The driver incident rates for slight incidents exhibit a left-skewed distribution, reflecting a concentration in the older age groups: the proportion grows until it peaks at 61-70 and declines afterwards.
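From the graphs above, it is hard to determine the association between age and incidents, as neither a positive nor a negative trend is present. A binomial logistic regression is therefore fitted with the 20-30 age group as the reference level, so that each other age group is compared with it. All six age-group coefficients have small p-values (below 0.01), indicating that the odds of an incident in every other age group differ significantly from those in the 20-30 group. The 81-90 group is the only one with a negative coefficient, i.e. lower odds than the 20-30 group.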
##
## Call:
## glm(formula = binary_outcome ~ age_group, family = "binomial",
## data = bin_age)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.0189 0.5284 0.5378 0.5759 1.6651
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.3904 0.1582 2.468 0.01359 *
## age_group31-40 1.3223 0.1692 7.817 5.41e-15 ***
## age_group41-50 1.1605 0.1678 6.916 4.66e-12 ***
## age_group51-60 1.4702 0.1660 8.858 < 2e-16 ***
## age_group61-70 1.5078 0.1671 9.022 < 2e-16 ***
## age_group71-80 1.0620 0.1841 5.768 8.04e-09 ***
## age_group81-90 -1.4890 0.5401 -2.757 0.00583 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 10023.7 on 11724 degrees of freedom
## Residual deviance: 9890.9 on 11718 degrees of freedom
## AIC: 9904.9
##
## Number of Fisher Scoring iterations: 4
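Log-odds coefficients are easier to read as odds ratios and probabilities. The sketch below converts the estimates copied from the glm output above. It assumes that `binary_outcome = 1` marks an incident and that age group is the only predictor, as the model formula shows. The variable names are illustrative.

```python
import math

# Coefficients copied from the glm output above (log-odds scale);
# 20-30 is the reference level absorbed into the intercept.
INTERCEPT = 0.3904
COEFS = {
    "31-40": 1.3223, "41-50": 1.1605, "51-60": 1.4702,
    "61-70": 1.5078, "71-80": 1.0620, "81-90": -1.4890,
}


def inv_logit(eta):
    """Convert log-odds to a probability."""
    return 1 / (1 + math.exp(-eta))


odds_ratios = {group: math.exp(b) for group, b in COEFS.items()}
probabilities = {"20-30": inv_logit(INTERCEPT)}
probabilities.update({g: inv_logit(INTERCEPT + b) for g, b in COEFS.items()})

for group, p in probabilities.items():
    ratio = odds_ratios.get(group, 1.0)  # reference group has odds ratio 1
    print(f"{group}: P(outcome = 1) = {p:.3f}, odds ratio vs 20-30 = {ratio:.2f}")
```

For example, the 81-90 coefficient of -1.4890 brings the linear predictor to about -1.10, which corresponds to a fitted probability of roughly 0.25.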
Tenure is measured in years. For drivers currently employed at Transdev, it is the interval between their start date and today. For those who have left, it is the interval between their start date and their employment termination date.
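A minimal sketch of this tenure rule follows. It assumes start and termination dates are available as `datetime.date` values and converts days to years using an average year of 365.25 days. The function name and the 365.25 convention are assumptions, not taken from the report.

```python
from datetime import date

DAYS_PER_YEAR = 365.25  # average year length, including leap years


def tenure_years(start, termination=None, today=None):
    """Tenure in years.

    Uses the termination date for drivers who have left; otherwise the
    interval runs to `today` (defaults to the current date).
    """
    if termination is not None:
        end = termination
    else:
        end = today if today is not None else date.today()
    return (end - start).days / DAYS_PER_YEAR


print(round(tenure_years(date(2011, 1, 1), termination=date(2021, 1, 1)), 2))  # 10.0
```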
The graph below shows a positive trend: the driver incident rate grows with driving experience. This may be misleading, because drivers with longer tenure have taken more shifts and so have had greater exposure to risk. It is therefore not surprising that all drivers with 41-60 years of experience have been involved in incidents.
The picture reverses when total tenure is broken down year by year. In this view, we record the specific year of employment in which each incident occurred and aggregate the counts by year. The previous positive trend becomes a positively skewed distribution in which tenure and incidents appear to be negatively correlated. Most incidents are recorded within drivers’ first three years. More broadly, the first ten years of employment account for a large share of all incidents, which warrants further discussion.
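The year-by-year breakdown can be sketched as follows. The sketch assumes each incident is paired with the driver’s employment start date, that incidents occur on or after that date, and that a tenure year lasts 365.25 days. The function names and dates are hypothetical.

```python
from collections import Counter
from datetime import date


def tenure_year_of_incident(start, incident_date):
    """Year of employment in which the incident happened (1 = first year)."""
    elapsed_years = (incident_date - start).days / 365.25
    return int(elapsed_years) + 1  # truncation = floor for non-negative values


def incidents_by_tenure_year(records):
    """records: iterable of (employment_start, incident_date) pairs."""
    return Counter(tenure_year_of_incident(s, d) for s, d in records)


# Toy records for illustration only.
records = [
    (date(2018, 1, 1), date(2018, 6, 1)),  # first year
    (date(2018, 1, 1), date(2020, 2, 1)),  # third year
    (date(2015, 7, 1), date(2015, 9, 1)),  # first year
]
print(incidents_by_tenure_year(records))
```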
A binomial logistic regression is used to verify the relationship observed in the graph above. The p-value for tenure is effectively 0 (z = -7.94), indicating that tenure is a significant predictor of incidents. The negative estimate implies that additional years of experience reduce the likelihood of running into an incident.
| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| (Intercept) | -2.1468690 | 0.0148696 | -144.379839 | 0 |
| num_tenure | -0.0132815 | 0.0016730 | -7.938845 | 0 |
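To put the tenure coefficient on a more intuitive scale, the sketch below converts the estimates in the table into odds multipliers and fitted probabilities. It assumes `num_tenure` is measured in years. The helper names are illustrative.

```python
import math

# Estimates copied from the regression table above (log-odds scale).
INTERCEPT = -2.1468690
SLOPE = -0.0132815  # num_tenure coefficient, assumed to be per year


def odds_multiplier(years):
    """Factor by which the odds of an incident change over `years` more tenure."""
    return math.exp(SLOPE * years)


def incident_probability(tenure):
    """Fitted probability of an incident at a given tenure."""
    eta = INTERCEPT + SLOPE * tenure
    return 1 / (1 + math.exp(-eta))


for years in (1, 5, 10):
    print(years, round(odds_multiplier(years), 3))
print(round(incident_probability(0), 4), round(incident_probability(10), 4))
```

Under this model, each additional year of tenure multiplies the odds of an incident by about 0.987, i.e. a reduction of roughly 1.3% per year.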
Monday has the most incidents, whereas Sunday has the fewest. The number of incidents is more even across weekdays and exceeds that on weekends by half. This is plausibly linked to heavy use of both private vehicles and public transport for commuting, which together create busy traffic.
The hourly graphs support the earlier point about peak traffic hours. On weekdays, 7-8 am and 3-5 pm have the most incidents. On weekends, the peak is around 3 pm, and the hourly variation in incident numbers is much narrower. In both cases, incidents tend to decrease in the evening.
Daytime is defined as 5 am up to (but not including) 7 pm, and the remaining hours as nighttime. Roughly nine times as many incidents are recorded during the daytime as at night, although this may partly reflect the bus shift timetable.
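The day/night split and the weekday/weekend hourly counts can be sketched as below. The sketch assumes incident timestamps are available as `datetime` values in local time. The function names and example timestamps are hypothetical.

```python
from collections import Counter
from datetime import datetime


def period_of_day(ts):
    """'day' from 05:00 up to (not including) 19:00, otherwise 'night'."""
    return "day" if 5 <= ts.hour < 19 else "night"


def day_type(ts):
    """Monday-Friday are weekdays; Saturday and Sunday are the weekend."""
    return "weekend" if ts.weekday() >= 5 else "weekday"


def hourly_counts(timestamps):
    """Count incidents per (weekday/weekend, hour of day)."""
    return Counter((day_type(ts), ts.hour) for ts in timestamps)


incident_times = [
    datetime(2021, 3, 1, 7, 30),  # Monday morning
    datetime(2021, 3, 1, 7, 45),  # Monday morning
    datetime(2021, 3, 6, 19, 0),  # Saturday, 7 pm
]
print(Counter(period_of_day(ts) for ts in incident_times))
print(hourly_counts(incident_times))
```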
Light rain is defined as precipitation below 2.5 mm per hour (Wikipedia, n.d.), so this value is used as the threshold separating little/no rain from rain. The hourly rainfall distributions for Melbourne and NSW are plotted with and without MVA incidents. They share a similar shape, with both peaking around 3 mm, which is not considered heavy rain. The downward-sloping curve for Melbourne is not as smooth as that for NSW. In Melbourne, incidents cluster around 4.5 mm of precipitation, and the right tail is slightly higher than in NSW. This suggests that rain has a stronger effect on incidents in Melbourne.
The distributions for Melbourne and NSW are approximately the same when there is little/no rain. The only visible difference is the taller peak for incidents in NSW, meaning that more incidents occurred in NSW under little/no rain conditions.
The following outputs summarise precipitation with and without incidents, respectively. Both distributions are concentrated at little/no rain, with zeros dominating the six-number summaries. The only differences lie in the mean and maximum: the mean rainfall is slightly higher when incidents occur, but the maximum is lower. These summaries do not mean that incidents affect precipitation; rather, we examine whether an association exists between them.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00000 0.00000 0.00000 0.05546 0.00000 9.40000
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.0000 0.0000 0.0484 0.0000 11.0000
The precipitation distributions with and without incidents are clearly non-normal, even on a log scale, so they do not meet the assumptions of the t-test.
\(H_0: \text{The precipitation distributions for incident and non-incident hours have the same location.}\) \(H_1: \text{The precipitation distributions for incident and non-incident hours differ in location.}\)
A nonparametric Wilcoxon rank-sum test with continuity correction is therefore used. Its small p-value (about 0.0009) indicates that precipitation is associated with incidents.
##
## Wilcoxon rank sum test with continuity correction
##
## data: Precipitation by has_incident
## W = 522922546, p-value = 0.0008991
## alternative hypothesis: true location shift is not equal to 0
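The sketch below shows what the rank-sum test computes. It uses the normal approximation with tie and continuity corrections, which is the approach R’s `wilcox.test` takes for large samples like these. It is a sketch written for this report, not R’s implementation. The function names are hypothetical, and the example uses tiny made-up samples rather than the precipitation data.

```python
import math
from collections import Counter
from statistics import NormalDist


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of positions i..j, made 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def wilcoxon_rank_sum(x, y):
    """Two-sided rank-sum test via the normal approximation with tie and
    continuity corrections. Returns (W, p_value)."""
    nx, ny = len(x), len(y)
    n = nx + ny
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    w = sum(ranks[:nx]) - nx * (nx + 1) / 2  # Mann-Whitney form of W
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    sigma = math.sqrt(nx * ny / 12 * ((n + 1) - ties / (n * (n - 1))))
    diff = w - nx * ny / 2
    correction = 0.5 if diff > 0 else (-0.5 if diff < 0 else 0.0)
    z = (diff - correction) / sigma
    p_value = 2 * min(NormalDist().cdf(z), 1 - NormalDist().cdf(z))
    return w, p_value


w, p = wilcoxon_rank_sum([1, 2, 3], [4, 5, 6])
print(w, round(p, 4))
```

The tie correction matters here because most precipitation values are exactly 0, so a large share of observations share the same rank.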
Similarly, the hourly temperature distributions are plotted with and without the occurrence of MVA incidents. For both Melbourne and NSW, the incident distributions closely follow the non-incident distributions but sit towards higher temperatures. Incidents peak at about 15 degrees in Melbourne and around 20 degrees in NSW. Most incidents fall between 10 and 25 degrees in Melbourne and between 15 and 25 degrees in NSW. Although on a smaller scale, the Melbourne distribution is more right-skewed than that for NSW, indicating that incidents are relatively common on hotter days in Melbourne.
The summary outputs below describe the temperature distributions with and without incidents, respectively. Every statistic for incidents, except the maximum, is larger than for non-incidents. This alone is not enough to conclude that temperatures are higher when incidents occur, but it is an initial indication that incidents may be associated with higher temperatures.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -2.79 13.50 18.17 18.63 22.93 45.87
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -3.28 11.79 16.03 16.57 20.57 46.63
Both temperature distributions appear approximately normal, so a two-sample (Welch) t-test is used to test significance.
\(H_0: \text{The temperature averages for incidents and non-incidents are equal.}\) \(H_1: \text{The temperature averages for incidents and non-incidents differ.}\)
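The p-value from this test is again below the threshold, so the null hypothesis is rejected: there is evidence of an association between temperature and incidents. The group means (16.57 °C without incidents and 18.63 °C with incidents) show that the mean temperature is about 2.06 °C higher when incidents occur. The 95% confidence interval for this difference is 1.92 to 2.20 °C; the output reports it as the No minus Yes difference, hence the negative signs.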
##
## Welch Two Sample t-test
##
## data: Temperature by has_incident
## t = -28.735, df = 12354, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -2.200039 -1.919052
## sample estimates:
## mean in group No mean in group Yes
## 16.57065 18.63019
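The Welch statistic and its fractional degrees of freedom can be computed directly from two samples, as in the following sketch. The function names and the toy samples are illustrative. The standard library has no t distribution, so the sketch falls back to a normal approximation for the p-value. That approximation is only reasonable when the degrees of freedom are large, as they are here (df = 12354), and should not be used for small samples like the toy example.

```python
import math
from statistics import NormalDist, mean, variance


def welch_t_test(x, y):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    vx = variance(x) / len(x)  # squared standard error of mean(x)
    vy = variance(y) / len(y)
    t = (mean(x) - mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df


def normal_approx_p(t):
    """Two-sided p-value from the normal distribution; only a reasonable
    stand-in for the t distribution when df is large (thousands)."""
    return 2 * (1 - NormalDist().cdf(abs(t)))


t, df = welch_t_test([1, 2, 3, 4], [2, 4, 6, 8])
print(round(t, 4), round(df, 3))
```

As in R’s formula interface, the first sample plays the role of the “No” group, so a higher mean in the second group produces a negative t.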
The graph above plots the distribution of incidents by the number of hours drivers had been driving when the incident occurred. The patterns differ between daytime and nighttime. During the daytime, incidents peak within the first 3 hours of driving, and driving hours 7 to 10 also accumulate many incidents. At night, the pattern shifts to later hours, with the peak at 7 to 9 hours. Additionally, many incidents are recorded around 9 hours into day drives and 12 hours into night drives.
Drivers’ workload is represented by the number of hours they have been driving within a day. The shift data set is expanded to long form so that each hour of driving is recorded as a row. This also allows hourly variables, such as temperature and precipitation, to be added. Although the effects of temperature and precipitation on incidents have been verified earlier, they are now compared with other variables. A random forest is therefore built to assess the importance of four features (hours of driving, hour of the day, temperature and precipitation) for the occurrence of incidents.
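The expansion to one row per driving hour, followed by the weather join, can be sketched as follows. The feature names `shift_hour_of_day`, `shift_hour_for_driver`, `Temperature` and `Precipitation` match the random forest formula in the output that follows. The rest is assumed: the row layout, the helper names, and a weather lookup keyed by hour-floored timestamps. The weather values in the example are invented.

```python
from datetime import datetime, timedelta


def expand_to_hours(driver_id, start, end):
    """Turn one daily driving window into one row per hour of driving.

    shift_hour_of_day is the clock hour; shift_hour_for_driver counts hours
    since the start of the window (1 = first hour).
    """
    rows = []
    t, hour_number = start, 1
    while t < end:
        rows.append({
            "driver": driver_id,
            "timestamp": t,
            "shift_hour_of_day": t.hour,
            "shift_hour_for_driver": hour_number,
        })
        t += timedelta(hours=1)
        hour_number += 1
    return rows


def attach_weather(rows, weather):
    """Add hourly weather; `weather` maps hour-floored datetimes to
    (temperature, precipitation). Missing hours get None."""
    for row in rows:
        hour = row["timestamp"].replace(minute=0, second=0, microsecond=0)
        row["Temperature"], row["Precipitation"] = weather.get(hour, (None, None))
    return rows


rows = expand_to_hours("D1", datetime(2021, 3, 1, 6, 30), datetime(2021, 3, 1, 9, 0))
weather = {datetime(2021, 3, 1, 7): (17.2, 0.0)}  # invented example reading
rows = attach_weather(rows, weather)
print([(r["shift_hour_of_day"], r["shift_hour_for_driver"], r["Temperature"]) for r in rows])
```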
The response variable is 0 if no incident was recorded during the given driving hour and 1 if an incident occurred. The imbalance between incidents and non-incidents could bias predictions, so the training data are up-sampled by replicating rows of the minority class until both classes are the same size.
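The sketch below shows one common variant of random up-sampling. It keeps every original row and adds minority-class rows drawn with replacement until the class counts match. The confusion matrix in the output is consistent with this balancing, since both classes contain 39,994 rows. The exact up-sampling routine the report used is not shown, so this is an assumption, and the function name is hypothetical. The sketch should be applied to the training split only. Because duplicated rows can fall both in-bag and out-of-bag for the same tree, the OOB error of a forest trained on up-sampled data is likely to be optimistic.

```python
import random


def upsample(rows, label_key="incident_indicator", seed=42):
    """Randomly duplicate rows of smaller classes (with replacement) until
    every class has as many rows as the largest class."""
    rng = random.Random(seed)
    by_class = {}
    for row in rows:
        by_class.setdefault(row[label_key], []).append(row)
    target = max(len(members) for members in by_class.values())
    balanced = []
    for members in by_class.values():
        balanced.extend(members)  # keep all original rows
        balanced.extend(rng.choices(members, k=target - len(members)))
    rng.shuffle(balanced)
    return balanced


train = [{"incident_indicator": 0}] * 10 + [{"incident_indicator": 1}] * 2
balanced = upsample(train)
print(len(balanced), sum(r["incident_indicator"] for r in balanced))  # 20 10
```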
##
## Call:
## randomForest(formula = incident_indicator ~ shift_hour_of_day + shift_hour_for_driver + Temperature + Precipitation, data = up_train, importance = TRUE)
## Type of random forest: classification
## Number of trees: 500
## No. of variables tried at each split: 2
##
## OOB estimate of error rate: 19.35%
## Confusion matrix:
## 0 1 class.error
## 0 27377 12617 0.31547232
## 1 2864 37130 0.07161074
## 0 1 MeanDecreaseAccuracy MeanDecreaseGini
## shift_hour_of_day -17.73132 197.7358 240.1236 2609.732
## shift_hour_for_driver -15.10100 180.5437 223.1298 3136.903
## Temperature -11.60708 152.7067 150.2549 10848.697
## Precipitation 16.30465 288.8519 287.9000 1045.711
Mean decrease accuracy measures how much the model’s accuracy falls when the values of a variable are randomly permuted. Mean decrease Gini measures the total decrease in Gini impurity from splits on that variable, averaged over all trees (Bhalla, n.d.). The two measures rank the features in exactly reverse order. Precipitation is the most important variable by mean decrease accuracy but the least important by mean decrease Gini. Conversely, temperature has the lowest mean decrease accuracy but the highest mean decrease Gini. The positions of the hour of the day and the driver’s driving hour are also swapped. The high Gini score for temperature is consistent with the known tendency of Gini importance to favour continuous variables with many distinct values.
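The idea behind mean decrease accuracy can be illustrated with a generic permutation-importance sketch. Shuffle one feature, re-score the model, and average the accuracy drop over a few repeats. This is a simplified stand-in written for this report, not `randomForest`’s implementation. That package measures the drop on each tree’s out-of-bag rows and, by default, reports it divided by its standard error, so its magnitudes are not directly comparable with this sketch. The toy model and data are hypothetical.

```python
import random


def accuracy(model, rows, labels):
    """Fraction of rows the model classifies correctly."""
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(labels)


def permutation_importance(model, rows, labels, feature, n_repeats=5, seed=0):
    """Average drop in accuracy after shuffling one feature's values."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    drops = []
    for _ in range(n_repeats):
        column = [r[feature] for r in rows]
        rng.shuffle(column)
        permuted = [{**r, feature: v} for r, v in zip(rows, column)]
        drops.append(baseline - accuracy(model, permuted, labels))
    return sum(drops) / n_repeats


# Toy data: the label depends only on "rain"; "temp" is irrelevant.
rows = [{"rain": i % 2, "temp": i} for i in range(200)]
labels = [i % 2 for i in range(200)]


def model(row):
    """Stand-in for a fitted classifier."""
    return int(row["rain"] > 0.5)


print(permutation_importance(model, rows, labels, "rain"))
print(permutation_importance(model, rows, labels, "temp"))  # 0.0
```

A feature the model ignores scores exactly zero, while a feature the model relies on produces a clear accuracy drop when shuffled.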
We now focus on interactions between variables that are plausibly related.
The interaction plot for time of day and drivers’ driving hours offers some meaningful insights. The most apparent pattern, in the corner of the plot, indicates a greater likelihood of incidents between 5 and 10 am for drivers who had driven for less than 5 hours. In the afternoon, between 2 pm and 6 pm, drivers who had driven for 7 hours or more that day are at greater risk of being involved in an incident.
No clear pattern is shown in the interaction plot between temperature and precipitation, even though they are both associated with incidents.
All of the variables investigated in this analysis are associated with the occurrence of MVA incidents. Although depot was not formally tested, the gaps between depots are clearly displayed. Notably, a large portion of total MVA incidents occurred in more populous regions, e.g. NSW and Melbourne. This is to be expected, as much research worldwide acknowledges the positive correlation between population and bus accidents (Bentama, et al., 2017). The unexpectedly high driver incident rate in Joondalup stands out. Given the high number of drivers recorded with slight incidents, we suspect that drivers in NSW, especially at Revesby and Taren Point, may have been driving with less care or have had more exposure to hazards.
Although the two-sample proportion test indicates that male drivers are more likely to encounter an incident, the graphs show that the difference in incidents between male and female drivers is not proportionate to the large imbalance in their numbers. This suggests that other, more complex factors may lie behind the relatively high incident rate among female drivers.
To some extent, drivers’ age and tenure are related, but an old driver is not necessarily experienced. The claim that the youngest group (20-30) has been driving more safely is only partially true. These drivers may be new to Transdev, with fewer than 10 years of bus driving experience, which is a period associated with a high occurrence of incidents. The low driver incident rates at longer tenure suggest that older drivers’ performance is stable; however, they should be mindful of slight incidents.
Driving at night is commonly associated with greater exposure to drink-driving, poor visibility and fatigue (Keall, et al., 2005), and the literature reviewed above is divided on its risk. Our findings indicate high risks of daytime driving in NSW, especially during peak traffic hours, when even drivers who had driven for less than 5 hours struggled to handle the busy traffic. Driving performance on drives longer than 8 hours is not ideal, regardless of day or night. The nighttime risk factors mentioned above should also be considered, because drivers who had already driven for 7 hours at night are recorded with a high number of incidents.
Although temperature and precipitation both have an impact on incidents, temperature’s impact appears to be stronger, since most precipitation values in the summary statistics fall below the 2.5 mm threshold for rain. However, drivers in Melbourne have been more likely than those in Sydney to have incidents in heavier rain and at high temperatures.
Use a tracking system to record every trip so that risky driving behaviours and dangerous locations can be identified. This is recommended for drivers based in the most populous regions, NSW and Melbourne, in response to their high driver incident rates.
Request feedback from drivers at regular intervals to gather more safety-related information about their trips, as traffic conditions change constantly. Share this information so that drivers have greater control over their drives, protecting both themselves and their passengers.
Familiarise drivers with potential road hazards, especially in populated areas.
Investigate driver performance specifically at the Joondalup, Revesby and Taren Point depots.
Investigate the factors that influence female drivers’ performance, e.g. physical and mental health, assigned routes and workplace treatment. Check in with them regularly to ensure they are able to manage their assigned shifts.
Provide extra training for newly joined drivers with less than 10 years of driving experience. At first, avoid assigning them to routes with high exposure to hazards or to peak traffic hours (7-8 am and 3-5 pm on weekdays, and 3 pm on weekends). For training, assign them to depots where no drivers have recorded incidents, e.g. Pyrmont, Randwick, Capalaba and CATS Perth.
Check in regularly with middle-aged and older drivers to ensure they remain mindful of minor incidents.
Arrange sufficient breaks for drivers who have driven for 5 hours during the day and also have afternoon shifts. Where possible, allocate another driver to the afternoon shift rather than keeping the same driver on the road for too long.
Ensure drivers get enough rest before every night drive. Adjust the shift timetable so that no driver drives for more than 7 hours at night. If a night shift lasts longer than 5 hours, assign two drivers so that one can take over from the other, avoiding fatigue and loss of concentration.
Check vehicle condition regularly, especially on hot days and in heavy rain, so that drivers remain in full control of their vehicles and these external factors are less likely to cause incidents.
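Several of these recommendations are concrete enough to check automatically against a roster. The Python sketch below encodes two sets of rules. The first is the peak windows from the training recommendation. The second is the night-shift limits: no more than 7 hours, and two drivers beyond 5 hours. The sketch is illustrative only. The `Shift` fields and function names are hypothetical, and treating each peak window as whole clock hours is an assumption. It does not cover the daytime break rule.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Peak windows from the recommendations, as half-open [start, end) clock
# hours: 7-8 am and 3-5 pm on weekdays, the 3 pm hour on weekends.
WEEKDAY_PEAKS = [(7, 8), (15, 17)]
WEEKEND_PEAKS = [(15, 16)]


def is_peak(ts: datetime) -> bool:
    """True if `ts` falls inside one of the peak windows."""
    windows = WEEKEND_PEAKS if ts.weekday() >= 5 else WEEKDAY_PEAKS
    return any(start <= ts.hour < end for start, end in windows)


def overlaps_peak(start: datetime, hours: float) -> bool:
    """True if a shift of `hours` starting at `start` touches a peak window.

    Peak windows begin on whole hours, so checking the start time and
    every later on-the-hour boundary before the shift ends is sufficient.
    """
    end = start + timedelta(hours=hours)
    t = start
    while t < end:
        if is_peak(t):
            return True
        t = (t + timedelta(hours=1)).replace(minute=0, second=0, microsecond=0)
    return False


@dataclass
class Shift:
    # Hypothetical roster record; field names are illustrative only.
    tenure_years: float
    start: datetime
    hours: float
    is_night: bool
    drivers_assigned: int = 1


def roster_warnings(shift: Shift) -> list[str]:
    """List the recommendations a shift would breach."""
    warnings = []
    if shift.tenure_years < 10 and overlaps_peak(shift.start, shift.hours):
        warnings.append("new driver rostered into a peak window")
    if shift.is_night and shift.hours > 7:
        warnings.append("night shift longer than 7 hours")
    if shift.is_night and shift.hours > 5 and shift.drivers_assigned < 2:
        warnings.append("night shift over 5 hours with a single driver")
    return warnings


# 1 March 2021 was a Monday; the shifts themselves are invented examples.
print(roster_warnings(Shift(3, datetime(2021, 3, 1, 6, 0), 3, False)))
print(roster_warnings(Shift(20, datetime(2021, 3, 1, 21, 0), 8, True)))
```

The peak check looks at the whole shift rather than only its start time. Otherwise a new driver starting at 6 am would pass, even though the shift runs through the 7-8 am weekday peak.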
“Safety first.” - Transdev
Incidents are prevalent, and reducing them is an everyday task. Throughout the analyses, we found associations between incidents and gender, age, tenure, weather (temperature and precipitation), time and driving hours. Regional differences between depots also emerged. Transdev could examine these variables in more detail to find out why male drivers incur more incidents, while also investigating the factors that affect female drivers’ performance. Peak periods on weekdays differ from those on weekend afternoons, but both require additional attention from drivers. Drivers’ workload should always be considered alongside their well-being to lower the risk of motor vehicle accidents.
| TypeOfClaimName |
|---|
| MVA - Hit tree/ tree branch |
| MVA - Hit barrier/ bollard |
| MVA - Reversing |
| MVA - Hit by Third Party |
| Mirror damage |
| MVA - Hit parked vehicle |
| MVA - Hit moving vehicle while driving straight |
| MVA - Hit moving vehicle while turning |
| MVA - Hit stationary vehicle while turning |
| MVA - Hit stationary vehicle driving straight |
| MVA - Hit illegally parked vehicle |
| MVA bus stop sign |
| MVA - Hit moving vehicle while changing lanes |
| MVA bus stop pole |
| MVA - Hit bus station |
| Pedestrian Injury - Hit by Transdev vehicle |
| Customer Injury- Other |
| Employee Injury - Motor Vehicle Accident |
| Customer Injury- Rear door strike |
| Pedestrian Injury |
| Collision with Object |
| Alleged - MVC |
| Motor Vehicle Collision |
| Collision with Public |
| Vandalism - Graffiti |
| Vandalism - Other |
| Other - Miscellaneous |
| MVC with Service Disruption |
| Mechanical Failure |
| Rock Throwing |
| DO NOT USE Near Miss – Other |
| Unsafe Acts - Employee |
| Employee Injury |
| DO NOT USE MVA – Near Miss |
| Near Miss |
| Employee Illness |
| Uncontrolled Movement |
| Physical Assault to Employee |
| Hazard Report |
| Anti-Social Behaviour |
| Third Party Injury |
| Customer Property Damage |
| Near Miss - Other |
| Breaking glass |
| Projectile - Other |
| Bus Rollaway |
| MVA - Near Miss |
| Near Miss - 3rd Party at Fault |
| Road rage |
| Near Miss - Transdev at Fault |
| Pedestrian/Passenger behaviour |
| Abusive behaviour |
| Customer Injury - Other |
| Customer Related - Other |
| Scratching/defacing |

| TypeOfClaimName |
|---|
| MVA - Hit by Third Party |
| MVA - Hit parked vehicle |
| Customer Injury- Heavy Braking |
| MVA - Hit moving vehicle while driving straight |
| MVA - Hit moving vehicle while turning |
| MVA - Hit stationary vehicle while turning |
| MVA - Hit stationary vehicle driving straight |
| MVA - Hit illegally parked vehicle |
| MVA - Hit moving vehicle while changing lanes |
| Pedestrian Injury - Hit by Transdev vehicle |
| MVA - Hit bus station |
| Employee Injury - Motor Vehicle Accident |
| Motor Vehicle Collision |
| Collision with Public |

| TypeOfClaimName |
|---|
| MVA - Hit tree/ tree branch |
| MVA - Hit barrier/ bollard |
| MVA - Near Miss |
| MVA - Reversing |
| MVA bus stop sign |
| MVA bus stop pole |
Bentama, A., Khatory, A., & Millot, M. (2017). Spatial analysis of bus accidents in France. In 2017 International Colloquium on Logistics and Supply Chain Management (LOGISTIQUA) (pp. 124–128). https://doi.org/10.1109/LOGISTIQUA.2017.7962885
Bhalla, D. (n.d.). A complete guide to random forest in R. Listen Data. https://www.listendata.com/2014/11/random-forest-with-r.html
Dietrich, J. (2020). citation: Software citation tools (R package version 0.4.1).
Goh, K. C. K., Currie, G., Sarvi, M., & Logan, D. (2014). Bus accident analysis of routes with/without bus priority. Accident Analysis & Prevention, 65, 18–27. https://doi.org/10.1016/j.aap.2013.12.002
Grolemund, G., & Wickham, H. (2011). Dates and times made easy with lubridate. Journal of Statistical Software, 40(3), 1–25. https://www.jstatsoft.org/v40/i03/
Hughes, C. (2020). Fatalities involving buses Australia 2010-2019. Statista. https://www.statista.com/statistics/1076120/australia-bus-fatalities/
IBISWorld. (2020, October). Public transport in Australia: Market research report. https://www.ibisworld.com/au/industry/public-transport/1965/
Kassambara, A. (2021). rstatix: Pipe-friendly framework for basic statistical tests (R package version 0.7.0). https://CRAN.R-project.org/package=rstatix
Keall, M. D., Frith, W. J., & Patterson, T. L. (2005). The contribution of alcohol to night time crash risk and other risks of night driving. Accident Analysis & Prevention, 37(5), 816–824. https://doi.org/10.1016/j.aap.2005.03.021
Kuhn, M. (2020). caret: Classification and regression training (R package version 6.0-86). https://CRAN.R-project.org/package=caret
Kuhn, M., & Vaughan, D. (2021). parsnip: A common API to modeling and analysis functions (R package version 0.1.5). https://CRAN.R-project.org/package=parsnip
Law, T. H., Daud, M. S., Hamid, H., & Haron, N. A. (2017). Development of safety performance index for intercity buses: An exploratory factor analysis approach. Transport Policy, 58, 46–52. https://doi.org/10.1016/j.tranpol.2017.05.003
Liaw, A., & Wiener, M. (2002). Classification and regression by randomForest. R News, 2(3), 18–22.
Mackie, B. (2008). The benefits of riding the bus. British Columbia Medical Journal, 50(9), 490. https://bcmj.org/presidents-comment/benefits-riding-bus
Paluszynska, A., Biecek, P., & Jiang, Y. (2020). randomForestExplainer: Explaining and visualizing random forests in terms of variable importance (R package version 0.10.1). https://CRAN.R-project.org/package=randomForestExplainer
Pearce, T., & Maunder, D. (2000). The causes of bus accidents in five emerging nations. Transport Research Laboratory. http://transport-links.com/wp-content/uploads/2019/11/1_549_PA3574-.pdf
Porcu, F., Olivo, A., Maternini, G., & Barabino, B. (2020). Evaluating bus accident risks in public transport. Transportation Research Procedia, 45, 443–450. https://doi.org/10.1016/j.trpro.2020.03.037
Robinson, D., Hayes, A., & Couch, S. (2021). broom: Convert statistical objects into tidy tibbles (R package version 0.7.5). https://CRAN.R-project.org/package=broom
Rudis, B. (2020). hrbrthemes: Additional themes, theme components and utilities for ‘ggplot2’ (R package version 0.8.0). https://CRAN.R-project.org/package=hrbrthemes
Sadot, S. L. (2019). Can public transportation reduce accidents? Evidence from the introduction of late-night buses in Israeli cities. Regional Science and Urban Economics, 74, 99–117. https://doi.org/10.1016/j.regsciurbeco.2018.11.009
Samerei, S. A., Aghabayk, K., Mohammadi, A., & Shiwakoti, N. (2021). Data mining approach to model bus crash severity in Australia. Journal of Safety Research, 76, 73–82. https://doi.org/10.1016/j.jsr.2020.12.004
Sievert, C. (2020). Interactive web-based data visualization with R, plotly, and shiny. Chapman and Hall/CRC.
Silge, J., Chow, F., Kuhn, M., & Wickham, H. (2021). rsample: General resampling infrastructure (R package version 0.0.9). https://CRAN.R-project.org/package=rsample
Titos, G., Lyamani, H., Drinovec, L., Olmo, F. J., Močnik, G., & Alados-Arboledas, L. (2015). Evaluation of the impact of transportation changes on air quality. Atmospheric Environment, 114, 19–31. https://doi.org/10.1016/j.atmosenv.2015.05.027
Truong, L. T., & Currie, G. (2019). Macroscopic road safety impacts of public transport: A case study of Melbourne, Australia. Accident Analysis & Prevention, 132, Article 105270. https://doi.org/10.1016/j.aap.2019.105270
Wickham, H., et al. (2019). Welcome to the tidyverse. Journal of Open Source Software, 4(43), 1686. https://doi.org/10.21105/joss.01686
Wikipedia. (n.d.). Rain. https://en.wikipedia.org/wiki/Rain
Wilke, C. O. (2021). ggridges: Ridgeline plots in ‘ggplot2’ (R package version 0.5.3). https://CRAN.R-project.org/package=ggridges
Zhu, H. (2021). kableExtra: Construct complex table with ‘kable’ and pipe syntax (R package version 1.3.4). https://CRAN.R-project.org/package=kableExtra